Overview

Dataset statistics

Number of variables: 19
Number of observations: 31648
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.6 MiB
Average record size in memory: 152.0 B
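
The memory figures in this report are mutually consistent if every one of the 19 columns costs 8 bytes per row. That width is an assumption (for example float64 values or object pointers); the report does not state the dtypes. A minimal arithmetic check under that assumption:

```python
# Assumption (not stated in the report): each column stores 8 bytes per row.
N_ROWS = 31648
N_COLS = 19
BYTES_PER_VALUE = 8  # assumed width of one stored value

record_bytes = N_COLS * BYTES_PER_VALUE         # bytes per record
total_mib = N_ROWS * record_bytes / 2**20       # whole table, in MiB
column_kib = N_ROWS * BYTES_PER_VALUE / 2**10   # a single column, in KiB

print(f"{record_bytes} B per record")
print(f"{total_mib:.2f} MiB in total")
print(f"{column_kib:.2f} KiB per column")
```

Under this assumption the record size is 152 B, the total rounds to the reported 4.6 MiB, and one column comes to about 247 KiB, matching the per-variable "Memory size" entries.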

Variable types

NUM: 14
CAT: 5

Reproduction

Analysis started: 2020-05-31 13:05:25.112592
Analysis finished: 2020-05-31 13:05:54.473255
Duration: 29.36 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

country has a high cardinality: 78 distinct values (High cardinality)
state has a high cardinality: 69 distinct values (High cardinality)
city has a high cardinality: 5905 distinct values (High cardinality)
10k is highly correlated with 5k and 9 other fields (High correlation)
5k is highly correlated with 10k and 8 other fields (High correlation)
20k is highly correlated with 5k and 9 other fields (High correlation)
half is highly correlated with 5k and 9 other fields (High correlation)
25k is highly correlated with 5k and 9 other fields (High correlation)
30k is highly correlated with 5k and 9 other fields (High correlation)
35k is highly correlated with 5k and 9 other fields (High correlation)
40k is highly correlated with 5k and 9 other fields (High correlation)
official is highly correlated with 5k and 9 other fields (High correlation)
pace is highly correlated with 5k and 9 other fields (High correlation)
overall is highly correlated with 10k and 9 other fields (High correlation)
genderdiv is highly correlated with overall (High correlation)
bib has unique values (Unique)

Variables

5k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 1433
Unique (%): 4.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.22636710970294452
Minimum: 0.0
Maximum: 1.0000000000000002
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.1525252525
Q1: 0.1877525253
median: 0.2184343434
Q3: 0.2607323232
95-th percentile: 0.3207070707
Maximum: 1
Range: 1
Interquartile range (IQR): 0.07297979798

Descriptive statistics

Standard deviation: 0.05322533827
Coefficient of variation (CV): 0.2351284086
Kurtosis: 1.771341124
Mean: 0.2263671097
Median Absolute Deviation (MAD): 0.03535353535
Skewness: 0.5653030277
Sum: 7164.066288
Variance: 0.002832936634
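
Several of the entries above are derived from others: the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The sketch below re-derives them from the 5k figures; the variable names are illustrative and the tolerance only absorbs the rounding of the printed values.

```python
import math

# Values copied from the 5k statistics in this report.
q1, q3 = 0.1877525253, 0.2607323232
mean, std = 0.2263671097, 0.05322533827

iqr = q3 - q1          # interquartile range
cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation

# Compare with the reported values, allowing for rounding in the printout.
print(math.isclose(iqr, 0.07297979798, rel_tol=1e-6))
print(math.isclose(cv, 0.2351284086, rel_tol=1e-6))
print(math.isclose(variance, 0.002832936634, rel_tol=1e-6))
```

The same relations hold for every numeric variable in the report.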
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
0.2042929293 | 81 | 0.3%
0.2036616162 | 79 | 0.2%
0.1986111111 | 78 | 0.2%
0.2061868687 | 75 | 0.2%
0.2111111111 | 74 | 0.2%
0.2064393939 | 72 | 0.2%
0.1946969697 | 72 | 0.2%
0.2011363636 | 72 | 0.2%
0.2060606061 | 71 | 0.2%
0.2108585859 | 71 | 0.2%
Other values (1423) | 30903 | 97.6%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.003156565657 | 1 | < 0.1%
0.003409090909 | 4 | < 0.1%
0.007954545455 | 2 | < 0.1%
0.008207070707 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.558459596 | 1 | < 0.1%
0.5186868687 | 1 | < 0.1%
0.5101010101 | 1 | < 0.1%
0.4953282828 | 1 | < 0.1%

10k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 2659
Unique (%): 8.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3655919974301697
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.240497076
Q1: 0.300229741
median: 0.3515037594
Q3: 0.421679198
95-th percentile: 0.532059315
Maximum: 1
Range: 1
Interquartile range (IQR): 0.121449457

Descriptive statistics

Standard deviation: 0.09095901505
Coefficient of variation (CV): 0.2487992508
Kurtosis: 0.4535122883
Mean: 0.3655919974
Median Absolute Deviation (MAD): 0.05931495405
Skewness: 0.5585924837
Sum: 11570.25553
Variance: 0.008273542419

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.3316624896 | 47 | 0.1%
0.337823726 | 47 | 0.1%
0.3355263158 | 42 | 0.1%
0.3279030911 | 42 | 0.1%
0.3425229741 | 41 | 0.1%
0.3367794486 | 41 | 0.1%
0.3237259816 | 41 | 0.1%
0.3264411028 | 40 | 0.1%
0.3308270677 | 40 | 0.1%
0.3430451128 | 39 | 0.1%
Other values (2649) | 31228 | 98.7%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.0052213868 | 3 | < 0.1%
0.005325814536 | 1 | < 0.1%
0.007832080201 | 1 | < 0.1%
0.01075605681 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9001670844 | 1 | < 0.1%
0.8182957393 | 1 | < 0.1%
0.8167293233 | 1 | < 0.1%
0.8127610693 | 1 | < 0.1%

20k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5228
Unique (%): 16.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3677107411271664
Minimum: 0.0
Maximum: 0.9999999999999999
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.231997344
Q1: 0.2951925631
median: 0.3500664011
Q3: 0.4252324037
95-th percentile: 0.5621912351
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1300398406

Descriptive statistics

Standard deviation: 0.1017784116
Coefficient of variation (CV): 0.2767893352
Kurtosis: 0.596195049
Mean: 0.3677107411
Median Absolute Deviation (MAD): 0.06332005312
Skewness: 0.7219371765
Sum: 11637.30954
Variance: 0.01035884507

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.3358831341 | 28 | 0.1%
0.3220717131 | 27 | 0.1%
0.3330677291 | 25 | 0.1%
0.3297211155 | 25 | 0.1%
0.3213811421 | 24 | 0.1%
0.3248339973 | 24 | 0.1%
0.3191500664 | 23 | 0.1%
0.3315803453 | 23 | 0.1%
0.3360956175 | 23 | 0.1%
0.3373173971 | 23 | 0.1%
Other values (5218) | 31403 | 99.2%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.002549800797 | 1 | < 0.1%
0.002656042497 | 1 | < 0.1%
0.004674634794 | 1 | < 0.1%
0.00823373174 | 2 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9330677291 | 1 | < 0.1%
0.8702257636 | 1 | < 0.1%
0.8500398406 | 1 | < 0.1%
0.8152988048 | 1 | < 0.1%

half
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 5489
Unique (%): 17.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.36841336781468886
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2322140556
Q1: 0.295610149
median: 0.3506343133
Q3: 0.4261478051
95-th percentile: 0.5643198752
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1305376561

Descriptive statistics

Standard deviation: 0.1022286079
Coefficient of variation (CV): 0.277483438
Kurtosis: 0.6026377855
Mean: 0.3684133678
Median Absolute Deviation (MAD): 0.06322996375
Skewness: 0.7274735953
Sum: 11659.54626
Variance: 0.01045068828

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.3219391865 | 27 | 0.1%
0.3321083367 | 26 | 0.1%
0.3191703584 | 24 | 0.1%
0.3285843737 | 23 | 0.1%
0.3411699557 | 23 | 0.1%
0.3254128071 | 23 | 0.1%
0.3215364478 | 22 | 0.1%
0.3304470399 | 22 | 0.1%
0.308195731 | 22 | 0.1%
0.3506343133 | 21 | 0.1%
Other values (5479) | 31415 | 99.3%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.002869512686 | 2 | < 0.1%
0.005487313733 | 1 | < 0.1%
0.008507853403 | 2 | < 0.1%
0.01092428514 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9265002014 | 1 | < 0.1%
0.8720298027 | 1 | < 0.1%
0.8517418445 | 1 | < 0.1%
0.8216874748 | 1 | < 0.1%

25k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 6556
Unique (%): 20.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3609134973026622
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2233368966
Q1: 0.2862840227
median: 0.3414604062
Q3: 0.4189622564
95-th percentile: 0.5666474796
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1326782337

Descriptive statistics

Standard deviation: 0.1051387952
Coefficient of variation (CV): 0.2913130043
Kurtosis: 0.6657078364
Mean: 0.3609134973
Median Absolute Deviation (MAD): 0.06405723214
Skewness: 0.7952161924
Sum: 11422.19036
Variance: 0.01105416625

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.3186415591 | 25 | 0.1%
0.3468875915 | 21 | 0.1%
0.2964394375 | 21 | 0.1%
0.3176959132 | 21 | 0.1%
0.3215196119 | 21 | 0.1%
0.3583175726 | 21 | 0.1%
0.2977551188 | 20 | 0.1%
0.3101307458 | 20 | 0.1%
0.30926733 | 20 | 0.1%
0.3267412219 | 19 | 0.1%
Other values (6546) | 31439 | 99.3%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.002343557273 | 2 | < 0.1%
0.00678398158 | 1 | < 0.1%
0.007688512458 | 2 | < 0.1%
0.01130663597 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9705205164 | 1 | < 0.1%
0.9023517803 | 1 | < 0.1%
0.8769015706 | 1 | < 0.1%
0.8419537867 | 1 | < 0.1%

30k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 7926
Unique (%): 25.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.33583957374459794
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2029303285
Q1: 0.2627606039
median: 0.3158503423
Q3: 0.3930547307
95-th percentile: 0.5379708061
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1302941268

Descriptive statistics

Standard deviation: 0.1028032319
Coefficient of variation (CV): 0.3061081539
Kurtosis: 0.6447323519
Mean: 0.3358395737
Median Absolute Deviation (MAD): 0.06226362017
Skewness: 0.8279680976
Sum: 10628.65083
Variance: 0.0105685045

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.280905198 | 19 | 0.1%
0.2994405026 | 18 | 0.1%
0.2796549245 | 17 | 0.1%
0.343200075 | 17 | 0.1%
0.3085674991 | 17 | 0.1%
0.3118494671 | 16 | 0.1%
0.3042853124 | 16 | 0.1%
0.2863126309 | 16 | 0.1%
0.2507736067 | 16 | 0.1%
0.2835620292 | 16 | 0.1%
Other values (7916) | 31480 | 99.5%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.001812896571 | 2 | < 0.1%
0.006720220048 | 1 | < 0.1%
0.007126558935 | 1 | < 0.1%
0.008533116619 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9713062232 | 1 | < 0.1%
0.8367455381 | 1 | < 0.1%
0.8135217079 | 1 | < 0.1%
0.7912043259 | 1 | < 0.1%

35k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 9342
Unique (%): 29.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3367644873239581
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.1990783169
Q1: 0.2606111387
median: 0.3153360739
Q3: 0.3978686078
95-th percentile: 0.5464965568
Maximum: 1
Range: 1
Interquartile range (IQR): 0.137257469

Descriptive statistics

Standard deviation: 0.1067676473
Coefficient of variation (CV): 0.3170395079
Kurtosis: 0.5190601842
Mean: 0.3367644873
Median Absolute Deviation (MAD): 0.06504150192
Skewness: 0.8164986246
Sum: 10657.92249
Variance: 0.01139933052

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.2853551884 | 17 | 0.1%
0.3112251578 | 17 | 0.1%
0.2770286193 | 16 | 0.1%
0.2647220549 | 16 | 0.1%
0.2956193868 | 15 | < 0.1%
0.2952266241 | 15 | < 0.1%
0.3015370113 | 15 | < 0.1%
0.2708229688 | 15 | < 0.1%
0.2624963997 | 14 | < 0.1%
0.2898064989 | 14 | < 0.1%
Other values (9332) | 31494 | 99.5%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.00107355136 | 2 | < 0.1%
0.007357754445 | 1 | < 0.1%
0.008326569087 | 1 | < 0.1%
0.008928805216 | 2 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9750988453 | 1 | < 0.1%
0.8488387316 | 1 | < 0.1%
0.8199837658 | 1 | < 0.1%
0.7993506323 | 1 | < 0.1%

40k
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10391
Unique (%): 32.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3518576171314184
Minimum: 0.0
Maximum: 1.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.206794679
Q1: 0.2718796992
median: 0.3293811452
Q3: 0.4175130133
95-th percentile: 0.568189705
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1456333141

Descriptive statistics

Standard deviation: 0.1112207779
Coefficient of variation (CV): 0.3160959789
Kurtosis: 0.3720835143
Mean: 0.3518576171
Median Absolute Deviation (MAD): 0.06860613071
Skewness: 0.7709058553
Sum: 11135.58987
Variance: 0.01237006144

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.2995951417 | 16 | 0.1%
0.290503181 | 15 | < 0.1%
0.3057721226 | 15 | < 0.1%
0.300242915 | 14 | < 0.1%
0.328582996 | 13 | < 0.1%
0.2896934644 | 13 | < 0.1%
0.30264893 | 13 | < 0.1%
0.2977906304 | 13 | < 0.1%
0.2863389242 | 13 | < 0.1%
0.3305957201 | 12 | < 0.1%
Other values (10381) | 31511 | 99.6%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.001388085599 | 1 | < 0.1%
0.001434355119 | 1 | < 0.1%
0.008629265471 | 1 | < 0.1%
0.008837478311 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9995835743 | 1 | < 0.1%
0.8788201272 | 1 | < 0.1%
0.8339386929 | 1 | < 0.1%
0.8263273569 | 1 | < 0.1%

official
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10848
Unique (%): 34.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.35208361312506
Minimum: 0.0
Maximum: 1.0000000000000002
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.206598586
Q1: 0.2721043903
median: 0.3303220738
Q3: 0.4180850135
95-th percentile: 0.5671489046
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1459806232

Descriptive statistics

Standard deviation: 0.1109437003
Coefficient of variation (CV): 0.3151061173
Kurtosis: 0.3291582404
Mean: 0.3520836131
Median Absolute Deviation (MAD): 0.06914986471
Skewness: 0.7517530927
Sum: 11142.74219
Variance: 0.01230850463

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.2573754037 | 14 | < 0.1%
0.3142183818 | 14 | < 0.1%
0.2788033517 | 14 | < 0.1%
0.3401195776 | 13 | < 0.1%
0.270096884 | 13 | < 0.1%
0.3116435367 | 13 | < 0.1%
0.3184297809 | 12 | < 0.1%
0.2747665183 | 12 | < 0.1%
0.2904774374 | 12 | < 0.1%
0.2999258095 | 12 | < 0.1%
Other values (10838) | 31519 | 99.6%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.00137470542 | 2 | < 0.1%
0.008837391987 | 1 | < 0.1%
0.008946495592 | 1 | < 0.1%
0.009208344244 | 1 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.989831544 | 1 | < 0.1%
0.8710831806 | 1 | < 0.1%
0.8319586279 | 1 | < 0.1%
0.8225320765 | 1 | < 0.1%

pace
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 702
Unique (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3520001462629808
Minimum: 0.0
Maximum: 0.9999999999999998
Zeros: 1
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.2069754145
Q1: 0.2715837621
median: 0.3299028016
Q3: 0.4173813608
95-th percentile: 0.5671812464
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1457975986

Descriptive statistics

Standard deviation: 0.1108735157
Coefficient of variation (CV): 0.3149814478
Kurtosis: 0.3273589478
Mean: 0.3520001463
Median Absolute Deviation (MAD): 0.06861063465
Skewness: 0.7513709293
Sum: 11140.10063
Variance: 0.01229293648

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.303030303 | 162 | 0.5%
0.2813036021 | 150 | 0.5%
0.2973127501 | 150 | 0.5%
0.2858776444 | 149 | 0.5%
0.2801600915 | 149 | 0.5%
0.2995997713 | 149 | 0.5%
0.3339050886 | 148 | 0.5%
0.3138936535 | 147 | 0.5%
0.2710120069 | 146 | 0.5%
0.3253287593 | 145 | 0.5%
Other values (692) | 30153 | 95.3%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
0.001143510577 | 2 | < 0.1%
0.008576329331 | 2 | < 0.1%
0.009719839909 | 2 | < 0.1%
0.01086335049 | 2 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9891366495 | 1 | < 0.1%
0.8702115495 | 1 | < 0.1%
0.8313321898 | 1 | < 0.1%
0.8227558605 | 1 | < 0.1%

overall
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 31595
Unique (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.49586622792781354
Minimum: 0.0
Maximum: 0.9999999999999999
Zeros: 2
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.0478969621
Q1: 0.2467507047
median: 0.4957250235
Q3: 0.7448872534
95-th percentile: 0.9442107736
Maximum: 1
Range: 1
Interquartile range (IQR): 0.4981365487

Descriptive statistics

Standard deviation: 0.2875779691
Coefficient of variation (CV): 0.5799507062
Kurtosis: -1.199877423
Mean: 0.4958662279
Median Absolute Deviation (MAD): 0.249076104
Skewness: 0.001798552736
Sum: 15693.17438
Variance: 0.08270108829

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 2 | < 0.1%
9.395552772e-05 | 2 | < 0.1%
0.0003758221109 | 2 | < 0.1%
0.0008142812402 | 2 | < 0.1%
0.001440651425 | 2 | < 0.1%
0.001096147823 | 2 | < 0.1%
0.0005637331663 | 2 | < 0.1%
3.131850924e-05 | 2 | < 0.1%
6.263701848e-05 | 2 | < 0.1%
0.0002192295647 | 2 | < 0.1%
Other values (31585) | 31628 | 99.9%

Value | Count | Frequency (%)
0 | 2 | < 0.1%
3.131850924e-05 | 2 | < 0.1%
6.263701848e-05 | 2 | < 0.1%
9.395552772e-05 | 2 | < 0.1%
0.000125274037 | 2 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9999686815 | 1 | < 0.1%
0.999937363 | 1 | < 0.1%
0.9998434075 | 1 | < 0.1%
0.9998120889 | 1 | < 0.1%

age
Real number (ℝ≥0)

Distinct count: 64
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.38705873738103264
Minimum: 0.0
Maximum: 0.9999999999999999
Zeros: 33
Zeros (%): 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.1111111111
Q1: 0.2380952381
median: 0.380952381
Q3: 0.5079365079
95-th percentile: 0.6825396825
Maximum: 1
Range: 1
Interquartile range (IQR): 0.2698412698

Descriptive statistics

Standard deviation: 0.1793616581
Coefficient of variation (CV): 0.4633964842
Kurtosis: -0.56427208
Mean: 0.3870587374
Median Absolute Deviation (MAD): 0.126984127
Skewness: 0.1670120483
Sum: 12249.63492
Variance: 0.03217060439

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0.4285714286 | 1191 | 3.8%
0.4444444444 | 1149 | 3.6%
0.3492063492 | 1075 | 3.4%
0.4603174603 | 1053 | 3.3%
0.380952381 | 1034 | 3.3%
0.5079365079 | 1004 | 3.2%
0.3650793651 | 976 | 3.1%
0.3968253968 | 961 | 3.0%
0.4761904762 | 904 | 2.9%
0.5238095238 | 897 | 2.8%
Other values (54) | 21404 | 67.6%

Value | Count | Frequency (%)
0 | 33 | 0.1%
0.01587301587 | 41 | 0.1%
0.03174603175 | 111 | 0.4%
0.04761904762 | 171 | 0.5%
0.06349206349 | 275 | 0.9%

Value | Count | Frequency (%)
1 | 5 | < 0.1%
0.9841269841 | 3 | < 0.1%
0.9682539683 | 3 | < 0.1%
0.9523809524 | 5 | < 0.1%
0.9365079365 | 6 | < 0.1%

gender
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 247.2 KiB

Value | Count | Frequency (%)
M | 17484 | 55.2%
F | 14164 | 44.8%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

genderdiv
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 17490
Unique (%): 55.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4553331856406589
Minimum: 0.0
Maximum: 1.0
Zeros: 4
Zeros (%): < 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.04360703312
Q1: 0.224137931
median: 0.4505519517
Q3: 0.6766814612
95-th percentile: 0.9055223626
Maximum: 1
Range: 1
Interquartile range (IQR): 0.4525435302

Descriptive statistics

Standard deviation: 0.269335098
Coefficient of variation (CV): 0.59151212
Kurtosis: -1.072844482
Mean: 0.4553331856
Median Absolute Deviation (MAD): 0.2262717651
Skewness: 0.09915551468
Sum: 14410.38466
Variance: 0.07254139499

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 4 | < 0.1%
0.0003414134517 | 4 | < 0.1%
5.690224195e-05 | 4 | < 0.1%
0.0005690224195 | 4 | < 0.1%
0.0001138044839 | 4 | < 0.1%
0.0004552179356 | 4 | < 0.1%
0.0002276089678 | 4 | < 0.1%
0.0005121201775 | 4 | < 0.1%
0.0001707067258 | 4 | < 0.1%
0.0002845112097 | 4 | < 0.1%
Other values (17480) | 31608 | 99.9%

Value | Count | Frequency (%)
0 | 4 | < 0.1%
5.690224195e-05 | 4 | < 0.1%
0.0001138044839 | 4 | < 0.1%
0.0001707067258 | 4 | < 0.1%
0.0002276089678 | 4 | < 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9999430978 | 1 | < 0.1%
0.9998861955 | 1 | < 0.1%
0.999772391 | 1 | < 0.1%
0.9992602709 | 1 | < 0.1%

division
Real number (ℝ≥0)

Distinct count: 6921
Unique (%): 21.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.27540847901001014
Minimum: 0.0
Maximum: 0.9999999999999999
Zeros: 23
Zeros (%): 0.1%
Memory size: 247.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.01390083118
Q1: 0.08684436801
median: 0.2030667813
Q3: 0.3725995987
95-th percentile: 0.804335053
Maximum: 1
Range: 1
Interquartile range (IQR): 0.2857552307

Descriptive statistics

Standard deviation: 0.244317063
Coefficient of variation (CV): 0.8871079926
Kurtosis: 0.3354066435
Mean: 0.275408479
Median Absolute Deviation (MAD): 0.1314130123
Skewness: 1.113716829
Sum: 8716.127544
Variance: 0.05969082726

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 23 | 0.1%
0.000286615076 | 22 | 0.1%
0.000143307538 | 22 | 0.1%
0.0005732301519 | 20 | 0.1%
0.0008598452279 | 20 | 0.1%
0.0004299226139 | 20 | 0.1%
0.001003152766 | 20 | 0.1%
0.00143307538 | 19 | 0.1%
0.001289767842 | 19 | 0.1%
0.0007165376899 | 19 | 0.1%
Other values (6911) | 31444 | 99.4%

Value | Count | Frequency (%)
0 | 23 | 0.1%
0.000143307538 | 22 | 0.1%
0.000286615076 | 22 | 0.1%
0.0004299226139 | 20 | 0.1%
0.0005732301519 | 20 | 0.1%

Value | Count | Frequency (%)
1 | 1 | < 0.1%
0.9997133849 | 1 | < 0.1%
0.9985669246 | 1 | < 0.1%
0.9984236171 | 1 | < 0.1%
0.9975637719 | 1 | < 0.1%

country
Categorical

HIGH CARDINALITY

Distinct count: 78
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 247.2 KiB

Value | Count | Frequency (%)
USA | 26939 | 85.1%
CAN | 2164 | 6.8%
GBR | 341 | 1.1%
ITA | 209 | 0.7%
MEX | 202 | 0.6%
GER | 180 | 0.6%
JPN | 172 | 0.5%
AUS | 123 | 0.4%
IRL | 116 | 0.4%
FRA | 113 | 0.4%
Other values (68) | 1089 | 3.4%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

state
Categorical

HIGH CARDINALITY

Distinct count: 69
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 247.2 KiB

Value | Count | Frequency (%)
MA | 7427 | 23.5%
others | 2545 | 8.0%
CA | 2302 | 7.3%
NY | 1537 | 4.9%
ON | 1045 | 3.3%
PA | 997 | 3.2%
TX | 988 | 3.1%
IL | 911 | 2.9%
OH | 754 | 2.4%
FL | 745 | 2.4%
Other values (59) | 12397 | 39.2%

Length

Max length: 6
Median length: 2
Mean length: 2.321663296
Min length: 2

city
Categorical

HIGH CARDINALITY

Distinct count: 5905
Unique (%): 18.7%
Missing: 0
Missing (%): 0.0%
Memory size: 247.2 KiB

Value | Count | Frequency (%)
Boston | 1018 | 3.2%
New York | 497 | 1.6%
Chicago | 312 | 1.0%
Cambridge | 306 | 1.0%
Toronto | 239 | 0.8%
Somerville | 239 | 0.8%
Brookline | 219 | 0.7%
Washington | 210 | 0.7%
Newton | 195 | 0.6%
San Francisco | 192 | 0.6%
Other values (5895) | 28221 | 89.2%

Length

Max length: 35
Median length: 8
Mean length: 8.799892568
Min length: 2

bib
Categorical

UNIQUE

Distinct count: 31648
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 247.2 KiB

Value | Count | Frequency (%)
26385 | 1 | < 0.1%
12607 | 1 | < 0.1%
16839 | 1 | < 0.1%
8995 | 1 | < 0.1%
12146 | 1 | < 0.1%
20518 | 1 | < 0.1%
20205 | 1 | < 0.1%
27315 | 1 | < 0.1%
28524 | 1 | < 0.1%
27608 | 1 | < 0.1%
Other values (31638) | 31638 | > 99.9%

Length

Max length: 5
Median length: 5
Mean length: 4.693914307
Min length: 1

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
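
The calculation described above can be sketched in plain Python. This is an illustrative implementation only, not the code the report generator runs, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0]
steep_up = [2 * v + 5 for v in x]      # exact increasing line
shallow_down = [-0.1 * v for v in x]   # exact decreasing line
print(pearson_r(x, steep_up), pearson_r(x, shallow_down))
```

Both lines are exactly linear, so r is +1 for the first and -1 for the second (up to floating-point rounding), regardless of how steep they are.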

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
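
A minimal sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. The average-rank treatment of ties is a common convention assumed here, not something the report specifies, and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions (assumed convention)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

x = [1, 2, 3, 4, 5]
cubed = [v ** 3 for v in x]   # nonlinear but strictly increasing
print(spearman_rho(x, cubed))
```

Because the cubic relationship is monotonic, the ranks of both variables coincide and ρ comes out as 1 even though the relationship is not linear.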

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
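
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows, with `kendall_tau_a` as a hypothetical helper name. Library implementations often default to the tie-corrected tau-b, so results may differ from this sketch when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair oppositely
    return (concordant - discordant) / len(pairs)

# One swapped neighbour out of six pairs: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The example has six pairs in total, so τ = (5 - 1) / 6, roughly 0.67.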

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
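
A sketch of the bias-corrected estimator follows. It assumes the commonly cited form of Bergsma's correction, in which φ² and the table dimensions are each adjusted before V is formed. The function name is hypothetical and this is not taken from the report generator's source.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, n_a in row_totals.items():
        for b, n_b in col_totals.items():
            expected = n_a * n_b / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Assumed form of the correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["u"] * 50 + ["v"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(perfect, independent)
```

A perfectly associated pair of labels gives a value of 1, and an exactly balanced independent pair gives 0, matching the range described above.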

Missing values

Sample

First rows

index | 5k | 10k | 20k | half | 25k | 30k | 35k | 40k | official | pace | overall | age | gender | genderdiv | division | country | state | city | bib
0 | 0.003409 | 0.007832 | 0.008234 | 0.008508 | 0.007689 | 0.008533 | 0.010421 | 0.010017 | 0.010147 | 0.010863 | 0.000219 | 0.460317 | M | 0.000398 | 0.001003 | JPN | others | Fukuoka | W1
1 | 0.106944 | 0.166667 | 0.157928 | 0.158276 | 0.150604 | 0.134029 | 0.128172 | 0.129555 | 0.127324 | 0.126930 | 0.000626 | 0.238095 | F | 0.000000 | 0.000000 | KEN | others | Eldoret | F1
2 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.365079 | M | 0.000000 | 0.000000 | RSA | others | Paarl | W2
3 | 0.106692 | 0.166562 | 0.157928 | 0.158276 | 0.150604 | 0.134029 | 0.128172 | 0.131521 | 0.130881 | 0.130932 | 0.000814 | 0.095238 | F | 0.000114 | 0.000287 | ETH | others | Shoa | F2
4 | 0.003409 | 0.005221 | 0.002550 | 0.002870 | 0.002344 | 0.001813 | 0.001074 | 0.001434 | 0.001375 | 0.001144 | 0.000031 | 0.349206 | M | 0.000057 | 0.000143 | JPN | others | Nogata Fukuoka | W3
5 | 0.106944 | 0.166667 | 0.157928 | 0.158276 | 0.150687 | 0.134029 | 0.128172 | 0.131521 | 0.131099 | 0.130932 | 0.000846 | 0.174603 | F | 0.000171 | 0.000430 | KEN | others | Nandi | F3
6 | 0.007955 | 0.010756 | 0.008234 | 0.008508 | 0.007689 | 0.006720 | 0.007358 | 0.008629 | 0.008837 | 0.008576 | 0.000094 | 0.158730 | M | 0.000171 | 0.000430 | SUI | others | Neuenkirch | W4
7 | 0.093687 | 0.144737 | 0.135564 | 0.135018 | 0.128073 | 0.112743 | 0.104868 | 0.108039 | 0.107423 | 0.108062 | 0.000125 | 0.174603 | M | 0.000228 | 0.000573 | ETH | others | Addis Ababa | 5
8 | 0.003157 | 0.005221 | 0.002656 | 0.002870 | 0.002344 | 0.001813 | 0.001074 | 0.001388 | 0.001375 | 0.001144 | 0.000063 | 0.396825 | M | 0.000114 | 0.000287 | JPN | others | Isahaya | W6
9 | 0.093434 | 0.144737 | 0.136414 | 0.136931 | 0.131075 | 0.119807 | 0.117436 | 0.124303 | 0.124880 | 0.125214 | 0.000595 | 0.206349 | M | 0.001081 | 0.002723 | USA | CA | Redding | 6

Last rows

index | 5k | 10k | 20k | half | 25k | 30k | 35k | 40k | official | pace | overall | age | gender | genderdiv | division | country | state | city | bib
31638 | 0.316288 | 0.538429 | 0.583373 | 0.586337 | 0.586013 | 0.540368 | 0.533319 | 0.552527 | 0.548202 | 0.548313 | 0.928187 | 0.206349 | M | 0.941163 | 0.802952 | USA | MA | Dorchester | 35901
31639 | 0.332955 | 0.552527 | 0.576627 | 0.578886 | 0.595140 | 0.566468 | 0.578356 | 0.602707 | 0.601379 | 0.601487 | 0.965424 | 0.603175 | M | 0.972516 | 0.247062 | USA | MA | Reading | 35902
31640 | 0.301136 | 0.494256 | 0.490465 | 0.490687 | 0.476729 | 0.450098 | 0.426226 | 0.460197 | 0.454417 | 0.454545 | 0.813467 | 0.238095 | M | 0.838113 | 0.715391 | USA | MA | Hyde Park | 35905
31641 | 0.278662 | 0.487782 | 0.501195 | 0.498137 | 0.491736 | 0.475354 | 0.469299 | 0.487033 | 0.485140 | 0.485420 | 0.860915 | 0.301587 | M | 0.880335 | 0.747492 | USA | MA | Boston | 35906
31642 | 0.370581 | 0.627924 | 0.678247 | 0.681736 | 0.675397 | 0.628731 | 0.623576 | 0.638704 | 0.640700 | 0.640366 | 0.980113 | 0.412698 | M | 0.983669 | 0.372170 | USA | MA | Wayland | 35907
31643 | 0.232071 | 0.356099 | 0.337052 | 0.336035 | 0.321602 | 0.288188 | 0.281428 | 0.287149 | 0.286179 | 0.285878 | 0.308425 | 0.222222 | M | 0.426710 | 0.489109 | USA | CA | Larkspur | 35908
31644 | 0.294444 | 0.466374 | 0.490146 | 0.492449 | 0.490174 | 0.459413 | 0.464193 | 0.484372 | 0.484245 | 0.484277 | 0.859850 | 0.253968 | M | 0.879595 | 0.746776 | USA | MA | Norwell | 35909
31645 | 0.257955 | 0.442565 | 0.463373 | 0.465868 | 0.457281 | 0.425718 | 0.424707 | 0.440023 | 0.439469 | 0.439680 | 0.785813 | 0.047619 | F | 0.613804 | 0.749498 | USA | CT | West Simsbury | 35910
31646 | 0.293308 | 0.492168 | 0.498274 | 0.498389 | 0.501686 | 0.472197 | 0.470687 | 0.485298 | 0.484704 | 0.484277 | 0.860476 | 0.317460 | F | 0.683282 | 0.831040 | USA | MA | North Andover | 35911
31647 | 0.242045 | 0.386487 | 0.382098 | 0.383105 | 0.366828 | 0.329385 | 0.320128 | 0.324766 | 0.322292 | 0.322470 | 0.464861 | 0.571429 | M | 0.555935 | 0.199914 | USA | PA | Lancaster | 35912